The Minimum-Entropy Set Cover Problem

Authors

  • Eran Halperin
  • Richard M. Karp
Abstract

We consider the minimum entropy principle for learning data generated by a random source and observed with random noise. In our setting we have a sequence of observations of objects drawn uniformly at random from a population. Each object in the population belongs to one class. For each object we perform an observation which determines that the object belongs to one of a given set of classes. Given these observations, we are interested in assigning the most likely class to each of the objects. This scenario is natural and arises in many real-life situations. We show that, under reasonable assumptions, finding the most likely assignment is equivalent to the following variant of the set cover problem. Given a universe $U$ and a collection $S = (S_1, \ldots, S_m)$ of subsets of $U$, we wish to find an assignment $f : U \to S$ such that $u \in f(u)$ for every $u \in U$ and the entropy of the distribution defined by the values $|f^{-1}(S_i)|$ (normalized as $p_i = |f^{-1}(S_i)|/|U|$) is minimized. We show that this problem is NP-hard and that the greedy algorithm for set cover finds a cover whose entropy is within an additive constant of that of the optimal cover. This sheds new light on the behavior of the greedy set cover algorithm. We further enhance the greedy algorithm and show that the problem admits a polynomial time approximation scheme (PTAS). Finally, we demonstrate how this model and the greedy algorithm can be useful in real-life scenarios, in particular in problems arising naturally in computational biology.
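The following minimal Python sketch illustrates the objective and the greedy algorithm named in the abstract. It represents $S$ as a list of Python sets indexed $0, \ldots, m-1$ and $f$ as a dict from elements to set indices. These representations and the function names are illustrative assumptions, not the authors' code. The objective is $H(f) = -\sum_i p_i \log_2 p_i$ with $p_i = |f^{-1}(S_i)|/|U|$. The greedy routine is the textbook set-cover greedy, with each element assigned to the first chosen set that covers it (an assumed convention). The brute-force routine is exponential and is included only to compare against the optimum on tiny instances.

```python
import math
from collections import Counter
from itertools import product


def entropy(assignment):
    """Shannon entropy (bits) of p_i = |f^{-1}(S_i)| / |U|.

    `assignment` maps each element u to the index i of the set f(u) = S_i.
    """
    n = len(assignment)
    counts = Counter(assignment.values())  # class sizes |f^{-1}(S_i)|
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def greedy_min_entropy_cover(universe, sets):
    """Textbook greedy set cover; each newly covered element is assigned
    to the set that covered it first (an illustrative convention)."""
    uncovered = set(universe)
    assignment = {}
    while uncovered:
        # Pick the set containing the most still-uncovered elements.
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        newly = sets[best] & uncovered
        if not newly:
            raise ValueError("some element lies in no set")
        for u in newly:
            assignment[u] = best
        uncovered -= newly
    return assignment


def brute_force_min_entropy(universe, sets):
    """Exact optimum by trying every feasible f; exponential, tiny inputs only."""
    elems = sorted(universe)
    choices = [[i for i, s in enumerate(sets) if u in s] for u in elems]
    return min(entropy(dict(zip(elems, combo))) for combo in product(*choices))
```

Take U = {1, 2, 3, 4, 5, 6} and S = [{1, 2, 3, 4}, {4, 5}, {5, 6}, {1, 6}]. Greedy assigns elements 1, 2, 3, 4 to the first set and 5, 6 to the third, giving an entropy of about 0.918 bits. On this small instance that matches the brute-force optimum. In general, the abstract only guarantees an additive constant gap.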

Related resources

The Minimum Entropy Submodular Set Cover Problem

We study minimum entropy submodular set cover, a variant of the submodular set cover problem (Wolsey [22], Fujito [11], etc.) that generalizes the minimum entropy set cover problem (Halperin and Karp [12], Cardinal et al. [5]). We give a general bound on the approximation performance of the greedy algorithm using an approach that can be interpreted in terms of a particular type of biased network ...

Minimum Entropy Set Cover Problem for Lossy Data Compression

The classical minimum entropy set cover problem relies on finding the most likely assignment between a set of observations and a given set of their types. The solution is described by a partition of the data space that minimizes the entropy of the distribution of types. The problem finds natural applications in machine learning, clustering, and data classification. In this paper we show...

Minimum Entropy Submodular Optimization (and Fairness in Cooperative Games)

We study minimum entropy submodular optimization, a common generalization of the minimum entropy set cover problem, studied earlier by Cardinal et al., and the submodular set cover problem (Wolsey [Wol82], Fujishige [BIKP01], etc.). We give a general bound on the approximation performance of the greedy algorithm using an approach that can be interpreted in terms of a particular type of biased ne...

Minimum entropy orientations

We study graph orientations that minimize the entropy of the in-degree sequence. The problem of finding such an orientation is an interesting special case of the minimum entropy set cover problem previously studied by Halperin and Karp (Theoret. Comput. Sci., 2005) and by the current authors (Algorithmica, to appear). We prove that the minimum entropy orientation problem is NP-hard even if the ...
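The special-case relationship can be made concrete. Treat each edge as an element of the universe and each vertex v as the set of edges incident to v. Orienting an edge toward v then assigns it to v's set, so in-degrees play the role of the class sizes $|f^{-1}(S_i)|$. The Python sketch below evaluates the entropy of the in-degree distribution $\mathrm{indeg}(v)/|E|$. It finds an optimum by exhaustive search over all $2^{|E|}$ orientations, which is feasible only for tiny graphs. The function names and the edge-list representation are hypothetical, chosen for illustration.

```python
import math
from collections import Counter
from itertools import product


def indegree_entropy(heads):
    """Entropy (bits) of indeg(v) / |E|, given the head chosen for each edge."""
    m = len(heads)
    counts = Counter(heads)  # in-degree of every vertex that receives an edge
    return sum((c / m) * math.log2(m / c) for c in counts.values())


def min_entropy_orientation(edges):
    """Exhaustive search: each edge (u, v) is oriented toward u or toward v."""
    best_heads, best_h = None, math.inf
    for heads in product(*edges):  # one endpoint chosen per edge
        h = indegree_entropy(heads)
        if h < best_h:
            best_heads, best_h = heads, h
    return best_heads, best_h
```

On a star with edges (0, 1), (0, 2), (0, 3), orienting every edge into the center gives entropy 0. On a triangle, the best achievable in-degree split is 2 and 1, giving about 0.918 bits.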

Improved approximation algorithms for low-density instances of the Minimum Entropy Set Cover Problem

We study the approximability of instances of the minimum entropy set cover problem, parameterized by the average frequency of a random element in the covering sets. We analyze an algorithm combining a greedy approach with another one biased towards large sets. The algorithm is controlled by the percentage of elements to which we apply the biased approach. The optimal parameter choice has a phase...

Publication date: 2004